
This project develops a classifier that identifies kicks and punches in MMA fights, using a dataset of 597 annotated images captured from raw MMA footage and annotated with Roboflow. Initially, we trained six models (VGG, ResNet50, InceptionResNetV2, MobileNetV2, EfficientNetB0, YOLOv8) on four image classes: kick, punch, kicknt (kick, no touch), and punchnt (punch, no touch). However, these models only achieved an average accuracy of around 30%, and despite various data augmentation attempts and different architectures, we could not improve their performance.
To address this issue, we simplified the problem by removing the kicknt and punchnt classes and retrained the six models on just two classes: kick and punch. This modification produced a significant improvement: average accuracy rose by roughly 30 percentage points across all models.
In an effort to further enhance the performance, we decided to experiment with the YoloV8n classification model. To ensure fair comparisons, we conducted experiments using both the original dataset with four classes and the simplified dataset with two classes. These experiments aimed to identify the most suitable model and approach for accurately classifying kicks and punches in MMA fights.
Among the six models we trained, YOLOv8 was the best performer with 75% accuracy, while EfficientNetB0 was the weakest at 53.12%. We have also built a Streamlit app for the kick and punch classifier using the YOLOv8 model.
StreamLit URL: https://kick-and-punch-classifier.streamlit.app/
GitHub repository: https://github.com/jorgeluisgalarraga/kick-and-punch-detection
The MMA industry is undergoing rapid expansion, marked by a global upsurge in viewership and engagement. However, in contrast to other sports, the sports analytics domain within MMA remains nascent, presenting a promising avenue for pioneering solutions. Performance analysis in MMA is intricate, owing to the diverse spectrum of skills involved, often reliant on human expertise. Recognizing this void, we propose an innovative concept: an advanced classifier app engineered to precisely discern and categorize kicks and punches using video and image processing. This app stands to be a transformative tool for MMA contests.
Moreover, this app promises manifold advantages to coaches, referees, and the data-driven realm of player statistics. Coaches could harness real-time insights to tailor training regimens, pinpoint strengths, and mitigate weaknesses, elevating athlete preparation. Referees, grappling with split-second decisions, would gain an unprecedented aid in judging fight dynamics accurately, ensuring fair outcomes. The app's ability to generate comprehensive player statistics fuels data-driven performance evaluation, facilitating informed strategy development and insightful post-fight assessments. As MMA continues its ascent, this app emerges as a game-changer, reshaping the landscape of sports analytics.
No comparable findings were discovered on Kaggle. While certain applications exist, they diverge from our proposed concept.
We curated a custom dataset from the ground up, immersing ourselves in MMA videos on YouTube and meticulously capturing screenshots. Complementing this, we embarked on a quest for diverse fight images from across the internet. This comprehensive dataset encompasses four distinct classes: "kick," "kicknt" (no touch), "punch," and "punchnt" (no touch).
To expedite image classification and optimize dataset division, we used the Roboflow tool. Its capabilities proved instrumental, delivering an efficient and user-intuitive workflow. Roboflow enabled us to systematically arrange, annotate, and categorize the images, making it a valuable asset to the project's success.
The dataset can be downloaded from: https://universe.roboflow.com/georgebrown/punch-and-kick-detection-group
To optimize performance, we explored a range of image classification models: VGG, ResNet50, InceptionResNetV2, MobileNetV2, EfficientNetB0, and YOLOv8. The four classes we chose were kick, punch, kicknt (kick, no touch), and punchnt (punch, no touch). The similarity among these four classes often confused the models. Unexpectedly, introducing dropout, a technique aimed at improving robustness, reduced model accuracy. Despite employing data augmentation techniques, our average accuracy plateaued at around 30%.
To address this problem, we strategically simplified the classification by eliminating the kicknt and punchnt classes, refocusing solely on kick and punch. This recalibration yielded a remarkable transformation, lifting our average accuracy across all models by roughly 30 percentage points. To push performance even further, we experimented with the YOLOv8n classification model. To ensure fair comparisons, we ran experiments on both the original four-class dataset and the simplified two-class dataset, aiming to find the optimal solution.
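As an illustrative sketch of that simplification step, the two-class dataset can be derived from the four-class export by copying only the kick and punch folders. The split layout and folder names below are assumptions about the Roboflow export; adjust the paths as needed:

```python
import shutil
from pathlib import Path

def make_two_class_dataset(src_root, dst_root, keep=("kick", "punch")):
    """Copy only the kept class folders from each split, dropping the rest."""
    copied = []
    for split_dir in sorted(Path(src_root).iterdir()):   # e.g. train/, valid/, test/
        if not split_dir.is_dir():
            continue
        for class_dir in sorted(split_dir.iterdir()):
            if class_dir.name in keep:                    # skip kicknt / punchnt
                dst = Path(dst_root) / split_dir.name / class_dir.name
                shutil.copytree(class_dir, dst, dirs_exist_ok=True)
                copied.append(f"{split_dir.name}/{class_dir.name}")
    return copied
```

The resulting directory tree has the same split/class layout as the source, which is the shape Keras's `flow_from_directory` and YOLOv8's classification trainer both consume.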
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # Normalize pixel values to [0, 1]
    rotation_range=90,       # Randomly rotate images by up to 90 degrees
    width_shift_range=0.4,   # Randomly shift images horizontally by up to 40% of the width
    height_shift_range=0.4,  # Randomly shift images vertically by up to 40% of the height
    shear_range=0.5,         # Apply shear transformation
    zoom_range=0.2,          # Randomly zoom images by up to 20%
    horizontal_flip=True,    # Randomly flip images horizontally
    fill_mode='nearest'      # Fill areas vacated by augmentation with the nearest pixel
)
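To build intuition for what these generator parameters do, the same geometric operations can be mimicked on a toy array with plain NumPy (an illustration only, not the generator's actual implementation):

```python
import numpy as np

# A toy 4x4 single-channel "image" with one bright corner pixel
img = np.zeros((4, 4))
img[0, 0] = 1.0

# horizontal_flip=True: mirror the image left-to-right
flipped = np.fliplr(img)

# width_shift_range: move pixels one column to the right
# (Keras fills the vacated columns according to fill_mode)
shifted = np.roll(img, shift=1, axis=1)

# rotation_range: a 90-degree clockwise rotation as the extreme case
rotated = np.rot90(img, k=-1)
```

Tracking where the bright pixel lands after each transform makes the effect of each parameter concrete.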
plt.figure(figsize=(10, 10))
images, _ = next(training_set)
for i, image in enumerate(images[:9]):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image)
    plt.axis('off')
# Load the VGG16 model with pre-trained weights
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
# Freeze the pre-trained layers
for layer in vgg.layers:
    layer.trainable = False
x = Flatten()(vgg.output)
# x = Dropout(0.5)(x)
# x = Dense(256, activation='relu')(x)
# Output layer: a dense (fully connected) layer with softmax activation
# and number of units equal to the number of classes (len(folders))
prediction = Dense(len(folders), activation='softmax')(x)
vgg_model = Model(inputs=vgg.input, outputs=prediction)
test_model = keras.models.load_model(
    "./models/convnet_vgg_4_classes.keras")
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 2s 1s/step - loss: 1.3956 - accuracy: 0.3443 Test accuracy: 0.344
resnet = ResNet50V2(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
# Freeze all layers except the last one
for layer in resnet.layers[:-1]:
    layer.trainable = False
# Defining the final layers of the model
x = Flatten()(resnet.output)
prediction = Dense(len(folders), activation='softmax')(x)
model = Model(inputs=resnet.input, outputs=prediction)
test_model = keras.models.load_model(
    "./models/convnet_resnet_4_classes.keras")
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 1s 548ms/step - loss: 8.8857 - accuracy: 0.3115 Test accuracy: 0.311
# Defining the final layers of the model
x = inception.output
x = GlobalAveragePooling2D()(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(len(folders), activation='softmax')(x)
model = Model(inputs=inception.input, outputs=predictions)
test_model = keras.models.load_model(
    "./models/convnet_inceptionResnet_4_classes.keras")
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 3s 2s/step - loss: 1.3807 - accuracy: 0.3443 Test accuracy: 0.344
# Creating a new sequential model on top of a pre-trained MobileNetV2 backbone
tf.keras.backend.clear_session()
mnet = MobileNetV2(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
model = Sequential([mnet,
                    GlobalAveragePooling2D(),
                    Dense(512, activation="relu"),
                    BatchNormalization(),
                    Dropout(0.3),
                    Dense(128, activation="relu"),
                    Dropout(0.1),
                    Dense(32, activation="relu"),
                    # softmax (not sigmoid) so the class scores form a distribution,
                    # matching the categorical cross-entropy loss below
                    Dense(4, activation="softmax")])
model.layers[0].trainable = False
model.compile(loss="categorical_crossentropy", optimizer="Adam", metrics=["accuracy"])
model.summary()
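One design note on the classification head: categorical cross-entropy expects the final layer to emit a probability distribution over the classes. A quick NumPy check shows that softmax outputs sum to 1 while independent sigmoid scores generally do not, which is why softmax is the usual choice for a single-label output layer like this one:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.5, -1.0])  # raw scores for the four classes

soft_sum = softmax(logits).sum()   # forms a distribution: sums to 1
sig_sum = sigmoid(logits).sum()    # independent per-class scores: does not
```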
test_loss, test_accuracy = model.evaluate(test_set, verbose=1)
print('Test Accuracy: ', round((test_accuracy * 100), 2), "%")
2/2 [==============================] - 0s 189ms/step - loss: 1.4826 - accuracy: 0.2500 Test Accuracy: 25.0 %
efnb0 = efn.EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224,224,3), classes=n_classes)
model = Sequential()
model.add(efnb0)
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
test_loss, test_accuracy = model.evaluate(test_set, verbose=1)
print('Test Accuracy: ', round((test_accuracy * 100), 2), "%")
2/2 [==============================] - 1s 421ms/step - loss: 1.3885 - accuracy: 0.3167 Test Accuracy: 31.67 %
model = YOLO('yolov8n-cls.pt')  # load a pretrained classification model (recommended for training)
# Train the model
model.train(data='/home/jorgeluisg/Documents/001_George_brown/DL_2/project/dataset', epochs=20, imgsz=64)
# Predict with the model
results = model.predict(source)  # predict on an image
image 1/1 /home/jorgeluisg/Documents/001_George_brown/DL_2/project/data/test/punch/Holm-vs-Aldana-1024x682_jpg.rf.09bae4f158e1f8cbb74a4047dbdafe5b.jpg: 64x64 punch 0.73, kick 0.27, 2.0ms Speed: 0.7ms preprocess, 2.0ms inference, 0.1ms postprocess per image at shape (1, 3, 64, 64)
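Turning printed class probabilities like these into a single top-1 decision is straightforward; here is a plain-Python helper (independent of the Ultralytics result object, whose exact API we leave to its docs):

```python
def top1(probs):
    """Return the (label, confidence) pair with the highest probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]

# The probabilities printed for the sample image above
label, conf = top1({"punch": 0.73, "kick": 0.27})  # -> ("punch", 0.73)
```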
# Load the VGG16 model with pre-trained weights
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
for layer in vgg.layers:
    layer.trainable = False
# Adding custom layers on top of VGG16
x = Flatten()(vgg.output)
# Dropout layer to reduce overfitting
x = Dropout(0.5)(x)
# Output layer: a dense (fully connected) layer with softmax activation
# and number of units equal to the number of classes (len(folders))
prediction = Dense(len(folders), activation='softmax')(x)
# Connecting the VGG16 input to the custom layers for prediction
vgg_model = Model(inputs=vgg.input, outputs=prediction)
# Load the best model
test_model = keras.models.load_model(
    "./models/convnet_with_just_vgg.keras")
# Evaluate the model on the test set
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 0s 98ms/step - loss: 0.6331 - accuracy: 0.6333 Test accuracy: 0.633
# Defining the model
resnet = ResNet50V2(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
for layer in resnet.layers:
    layer.trainable = False
x = Flatten()(resnet.output)
prediction = Dense(len(folders), activation='softmax')(x)
resnet_model = Model(inputs=resnet.input, outputs=prediction)
# Loading the model and displaying the accuracy on the test data
test_model = keras.models.load_model(
    "./models/convnet_with_resnet.keras")
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 1s 555ms/step - loss: 1.1269 - accuracy: 0.5667 Test accuracy: 0.567
# Defining the model
inception = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
for layer in inception.layers:
    layer.trainable = False
x = inception.output
x = GlobalAveragePooling2D()(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
predictions = Dense(2, activation='softmax')(x)
inception_model = Model(inputs=inception.input, outputs=predictions)
test_model = keras.models.load_model(
    "./models/convnet_with_inceptionResnet.keras")
test_loss, test_acc = test_model.evaluate(test_set)
print(f"Test accuracy: {test_acc:.3f}")
2/2 [==============================] - 3s 1s/step - loss: 0.6795 - accuracy: 0.6000 Test accuracy: 0.600
tf.keras.backend.clear_session()
mnet = MobileNetV2(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
model = Sequential([mnet,
                    GlobalAveragePooling2D(),
                    Dense(512, activation="relu"),
                    BatchNormalization(),
                    Dropout(0.3),
                    Dense(128, activation="relu"),
                    Dropout(0.1),
                    Dense(32, activation="relu"),
                    # softmax (not sigmoid) so the class scores form a distribution,
                    # matching the categorical cross-entropy loss below
                    Dense(2, activation="softmax")])
model.layers[0].trainable = False
model.compile(loss="categorical_crossentropy", optimizer="Adam", metrics=["accuracy"])
model.summary()
test_loss, test_accuracy = model.evaluate(test_set, verbose=1)
print('Test Accuracy: ', round((test_accuracy * 100), 2), "%")
1/1 [==============================] - 0s 176ms/step - loss: 0.6556 - accuracy: 0.6562 Test Accuracy: 65.62 %
efnb0 = efn.EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224,224,3), classes=n_classes)
model = Sequential()
model.add(efnb0)
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
test_loss, test_accuracy = model.evaluate(test_set, verbose=1)
print('Test Accuracy: ', round((test_accuracy * 100), 2), "%")
1/1 [==============================] - 0s 230ms/step - loss: 0.5937 - accuracy: 0.5312 Test Accuracy: 53.12 %
model = YOLO('yolov8n-cls.pt')  # load a pretrained classification model (recommended for training)
# Train the model
model.train(data='/home/jorgeluisg/Documents/001_George_brown/DL_2/project/data', epochs=20, imgsz=64)
# Predict with the model
results = model.predict(source)  # predict on an image
image 1/1 /home/jorgeluisg/Documents/001_George_brown/DL_2/project/data/test/punch/Holm-vs-Aldana-1024x682_jpg.rf.09bae4f158e1f8cbb74a4047dbdafe5b.jpg: 64x64 punch 0.73, kick 0.27, 2.4ms Speed: 1.0ms preprocess, 2.4ms inference, 0.1ms postprocess per image at shape (1, 3, 64, 64)
| Model | Accuracy (4 classes) | Accuracy (2 classes) |
|---|---|---|
| VGG16 | 34.4% | 63.3% |
| ResNet50V2 | 31.1% | 56.7% |
| InceptionResNetV2 | 34.4% | 60.0% |
| MobileNetV2 | 25.0% | 65.62% |
| EfficientNetB0 | 31.67% | 53.12% |
| YOLOv8 | 47.5% | 75.0% |
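As a quick sanity check on the roughly 30-point improvement reported earlier, averaging the rows of the table above:

```python
four_class = {"VGG16": 34.4, "ResNet50V2": 31.1, "InceptionResNetV2": 34.4,
              "MobileNetV2": 25.0, "EfficientNetB0": 31.67, "YOLOv8": 47.5}
two_class = {"VGG16": 63.3, "ResNet50V2": 56.7, "InceptionResNetV2": 60.0,
             "MobileNetV2": 65.62, "EfficientNetB0": 53.12, "YOLOv8": 75.0}

avg4 = sum(four_class.values()) / len(four_class)   # about 34.01
avg2 = sum(two_class.values()) / len(two_class)     # about 62.29
gain = avg2 - avg4                                  # about 28.28 points
```

The observed gain is about 28 percentage points, in line with the ~30% figure quoted in the text.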
Through an exploration of prominent image classification models, including VGG, ResNet50, InceptionResNetV2, MobileNetV2, EfficientNetB0, and YOLOv8, we encountered a complex challenge driven by the similarities among the four classes: kick, punch, kicknt, and punchnt. Surprisingly, introducing dropout hindered rather than enhanced accuracy, leaving performance at a plateau of roughly 30% despite data augmentation. Focusing on the kick and punch classes alone yielded a roughly 30-percentage-point accuracy improvement across models. Further experimentation with YOLOv8n on both the original four-class dataset and the simplified two-class version consistently demonstrated YOLOv8n's superior accuracy, cementing its role as the best fit for this task.
Link to Streamlit app: https://kick-and-punch-classifier.streamlit.app/